The opioid epidemic is a serious public health problem in the United States. The National Vital Statistics System provides data made available by the CDC and U.S. Department of Health regarding national mortality for drug overdoses over the past several years. The goal of this project is to perform statistical analyses to learn about the relationships between variables in the data set, geospatial analyses to understand the critical centers of the epidemic, as well as forcasting to predict future overdose trajectories.
Over the past 20 years, the United States has experienced an increase in the amount of deaths that can be attributed to drug overdoses. In particular, opioids have been at the heart of this problem, including both prescription medications and illegal opioids. Opioids are used to treat pain by interacting with opioid receptors in the body and brain. In addition to pain relief, opioids also elicit a feeling of pleasure. This combination of pain relief and pleasure provides some explanation for the addictive quality of opioid drugs. Over-prescribing by clinicians and access to illicit drugs have allowed this to become a severe epidemic that has shown little indication of slowing down, that both local and national agencies are attempting to address.
To understand the nature of the problem of drug overdose mortality, an interdisciplinary approach can provide the most insight. Firstly this is a public health issue, requiring the knowledge of clinicians, psychiatrists, and public health officials. This is also an economic and geographic issue, as socioeconomic status and location may have a large affect on drug overdose mortality. It is possible to use additional data sets from SAMHSA, NIDA and the DEA to integrate with the drug overdose mortality data will provide more insight for correlative analysis and trends regarding the opioid epidemic.
The primary data set used for this project was obtained from the publicly available data from the National Vital Statistics System. The link to the data download is here. The data analyzed is titled “VSRR Provisional Drug Overdose Death Counts” and contains provisional monthly counts for drug overdose deaths and total number of deaths (from mortality data) for all 50 states in the United States over the years 2015-2017. The data also includes information on the specific drug categories determined to be the cause of overdose for 18 states.
The main goal is to analyze the data with some exploratory analysis as well as graphical analysis to visualize data in a meaningful way to help understand the scale and severity of the opioid epidemic in the United States.
First, all necessary packages are loaded first to keep the code organized.
# load necessary packages
library(tidyverse)
library(sf)
library(RColorBrewer)
library(tigris)
library(tmap)
library(tmaptools)
library(grid)
library(leaflet)
library(forecast)
library(ggthemes)
library(png)Next, data needs to be loaded and cleaned for analysis. The dplyr package from tidyverse will be used to organize the data. First, we load in the two data tables that will be used for further analysis. The first dataset is the actual VSRR drug overdose data and the second dataset is population estimates for each state for the years of interest from the U.S. Census Bureau. Then, all of the necessary variables will be created to perform some geospatial analysis to look at the total number of drug overdoses by state for the years 2015, 2016, and 2017.
# load in data
data <- read.csv("VSRR_drug_od.csv", header=TRUE)
population <- read.csv("us-pop-est.csv", header=TRUE)
str(data)## 'data.frame': 10108 obs. of 10 variables:
## $ State : Factor w/ 53 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ State.Name : Factor w/ 53 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
## $ Month : Factor w/ 12 levels "April","August",..: 5 4 8 1 9 7 6 2 12 11 ...
## $ Indicator : Factor w/ 10 levels "Cocaine (T40.5)",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Data.Value : num 4034 4084 4101 4133 4196 ...
## $ Predicted.Value : int NA NA NA NA NA NA NA NA NA NA ...
## $ Percent.Complete : Factor w/ 2 levels "100","99.5+": 1 1 1 1 1 1 1 1 1 1 ...
## $ Percent.Pending.Investigation: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Footnote : Factor w/ 2 levels "","Underreported due to incomplete data": 1 1 1 1 1 1 1 1 1 1 ...
# need for subsequent analysis
drop.cols <- "State.Name"
# total deaths by state for each year
totalDeathsbyState <- data %>%
select(-one_of(drop.cols)) %>%
group_by(State, Year, Indicator) %>%
filter(Year %in% c("2015", "2016", "2017"),
Indicator == "Number of Deaths",
State != "US",
State != "YC") %>%
summarize(TotalDeath = sum(Data.Value))
totalDeathsbyState <- inner_join(totalDeathsbyState, population, by=c("State", "Year"))
totalDeathsbyState <- totalDeathsbyState %>%
mutate(deathRate = TotalDeath / Population * 100000)
drugODbyState <- data %>%
select(-one_of(drop.cols)) %>%
group_by(State, Year, Indicator) %>%
filter(Year %in% c("2015", "2016", "2017"),
Indicator == "Number of Drug Overdose Deaths",
State != "US",
State != "YC") %>%
summarize(TotalODDeath = sum(Data.Value))
drugODbyState <- inner_join(drugODbyState, population, by=c("State", "Year"))
drugODbyState <- drugODbyState %>%
mutate(ODrate = TotalODDeath / Population * 100000)Histograms were generated to look at both the total death rate and the overdose death rate.
# Histogram for Death Rate
ggplot(data=totalDeathsbyState, aes(x=deathRate)) +
geom_histogram(bins=20, aes(y=..density..), colour="black", fill="white") +
ggtitle("Total Death Rate (per 100,000)") +
geom_density(alpha=.2, fill="lightblue") +
facet_grid(~Year)# Histogram for Overdose Death Rate
ggplot(data=drugODbyState, aes(x=ODrate)) +
geom_histogram(bins=20, aes(y=..density..), colour="black", fill="white") +
ggtitle("Overdose Death Rate (per 100,000)") +
geom_density(alpha=.2, fill="lightblue") +
facet_grid(~Year)After grouping and filtering data, the data can be joined to make it easier for subsequent analysis. The percentage of overdose deaths per state per year can be calculated and added as a new column.
totalDeathsandOD <- inner_join(drugODbyState, totalDeathsbyState, by=c("State", "Year", "Population"))
percentOD <- totalDeathsandOD %>%
mutate(percent = ((TotalODDeath / TotalDeath) * 100))A histogram can be used to get an idea of the distribution of the overdose percentages for the years 2015-2017.
ggplot(data=percentOD, aes(x=percent)) +
geom_histogram(bins=20, aes(y=..density..), colour="black", fill="white") +
ggtitle("Percentage of All Deaths Attributed to Overdose") +
geom_density(alpha=.2, fill="lightblue") +
facet_grid(~Year)For 18 states, there is data on the specific drug category the overdose deaths can be classified into. To visualize this, we can generate a stacked bar plot for each year to show the differences between each year.
# Specific drug categories
specificDrugs <- data %>%
select(-one_of(drop.cols)) %>%
group_by(State, Year, Indicator) %>%
filter(Year %in% c("2015", "2016", "2017"),
Indicator %in% c("Heroin (T40.1)",
"Natural & semi-synthetic opioids (T40.2)",
"Synthetic opioids, excl. methadone (T40.4)",
"Methadone (T40.3)",
"Cocaine (T40.5)"),
State != "US",
State != "YC") %>%
summarize(TotalDeath = sum(Data.Value))
allOD <- inner_join(drugODbyState,
specificDrugs, by=c("State", "Year"))
ggplot(allOD, aes(x=State, group=Year)) +
geom_bar(stat="identity", aes(y=TotalDeath, fill=Indicator.y)) +
facet_grid(Year ~ .) +
theme_bw() +
scale_y_continuous("Number of Deaths",
labels = c("0", "10k", "20k", "30k", "40k")) +
scale_fill_discrete(name="Drug Category",
labels=c("Cocaine", "Heroin", "Methadone",
"Natural Opioids", "Synthetic Opioids"),
breaks=c("Cocaine (T40.5)",
"Heroin (T40.1)",
"Methadone (T40.3)",
"Natural & semi-synthetic opioids (T40.2)",
"Synthetic opioids, excl. methadone (T40.4)")) +
ggtitle("Breakdown of Drug Overdose Deaths by Drug Category")These are interesting plots as they convey several types of information. First, from 2015 to 2017 there has been an overall increase of total drug overdose deaths for all states. Focusing specifically on the opiods, heroin deaths seem to have increased each year for most states. Natural opioid deaths appear to remain relatively consistent over the 2015-2017 period. The most striking difference is the increase of synthetic opioid deaths, which have nearly doubled for some states between 2016 and 2017.
To address the goal of geographical analysis, loading in the coordinate data is necessary. Our map shapefile was downloaded from the US Census Bureau. The package sf is used to import the shapefile as an sf object.
# importing state shapefile as sf object
US.sf <- st_read("cb_2017_us_state_20m/cb_2017_us_state_20m.shp")## Reading layer `cb_2017_us_state_20m' from data source `/Users/kateslovik/BMIN503_Final_Project/cb_2017_us_state_20m/cb_2017_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
str(US.sf)## Classes 'sf' and 'data.frame': 52 obs. of 10 variables:
## $ STATEFP : Factor w/ 52 levels "01","02","04",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ STATENS : Factor w/ 52 levels "00068085","00294478",..: 51 22 23 17 27 28 29 30 16 19 ...
## $ AFFGEOID: Factor w/ 52 levels "0400000US01",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ GEOID : Factor w/ 52 levels "01","02","04",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ STUSPS : Factor w/ 52 levels "AK","AL","AR",..: 1 5 6 8 14 15 13 18 19 21 ...
## $ NAME : Factor w/ 52 levels "Alabama","Alaska",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ LSAD : Factor w/ 1 level "00": 1 1 1 1 1 1 1 1 1 1 ...
## $ ALAND : num 1.48e+12 4.03e+11 2.68e+11 1.58e+08 2.14e+11 ...
## $ AWATER : num 2.78e+11 2.05e+10 1.18e+09 1.87e+07 2.39e+09 ...
## $ geometry:sfc_MULTIPOLYGON of length 52; first list element: List of 47
## ..$ :List of 1
## .. ..$ : num [1:13, 1:2] -173 -173 -173 -173 -172 ...
## ..$ :List of 1
## .. ..$ : num [1:37, 1:2] -175 -175 -175 -175 -175 ...
## ..$ :List of 1
## .. ..$ : num [1:39, 1:2] -177 -177 -177 -177 -177 ...
## ..$ :List of 1
## .. ..$ : num [1:17, 1:2] -178 -177 -177 -177 -177 ...
## ..$ :List of 1
## .. ..$ : num [1:19, 1:2] -178 -178 -178 -178 -178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] 179 180 180 180 180 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] 179 179 179 179 179 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 178 179 179 179 179 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] 178 178 178 178 178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 178 178 178 178 178 ...
## ..$ :List of 1
## .. ..$ : num [1:20, 1:2] 177 177 177 177 178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 174 174 174 174 174 ...
## ..$ :List of 1
## .. ..$ : num [1:12, 1:2] 173 174 174 174 174 ...
## ..$ :List of 1
## .. ..$ : num [1:20, 1:2] 172 173 173 173 173 ...
## ..$ :List of 1
## .. ..$ : num [1:77, 1:2] -134 -134 -134 -134 -134 ...
## ..$ :List of 1
## .. ..$ : num [1:63, 1:2] -134 -134 -134 -134 -134 ...
## ..$ :List of 1
## .. ..$ : num [1:45, 1:2] -135 -135 -135 -135 -135 ...
## ..$ :List of 1
## .. ..$ : num [1:78, 1:2] -137 -137 -136 -136 -136 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -152 -152 -152 -152 -152 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -154 -153 -153 -153 -153 ...
## ..$ :List of 1
## .. ..$ : num [1:121, 1:2] -155 -155 -155 -155 -154 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] -155 -155 -155 -154 -154 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -156 -156 -156 -156 -156 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -157 -157 -157 -157 -157 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -157 -157 -157 -157 -157 ...
## ..$ :List of 1
## .. ..$ : num [1:24, 1:2] -160 -160 -160 -160 -160 ...
## ..$ :List of 1
## .. ..$ : num [1:19, 1:2] -161 -161 -161 -161 -160 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -161 -161 -161 -161 -161 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -161 -161 -161 -161 -161 ...
## ..$ :List of 1
## .. ..$ : num [1:5, 1:2] -162 -162 -162 -162 -162 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -163 -163 -162 -162 -162 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] -166 -165 -165 -165 -165 ...
## ..$ :List of 1
## .. ..$ : num [1:13, 1:2] -166 -166 -166 -166 -165 ...
## ..$ :List of 1
## .. ..$ : num [1:44, 1:2] -167 -167 -167 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:58, 1:2] -168 -168 -168 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:1244, 1:2] -168 -168 -168 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:33, 1:2] -169 -169 -169 -169 -169 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -170 -170 -169 -169 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:16, 1:2] -170 -170 -170 -170 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -170 -170 -170 -170 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -171 -171 -171 -171 -171 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -171 -171 -171 -171 -171 ...
## ..$ :List of 1
## .. ..$ : num [1:54, 1:2] -172 -172 -172 -172 -172 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -173 -173 -172 -172 -172 ...
## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr "STATEFP" "STATENS" "AFFGEOID" "GEOID" ...
US.sf <- US.sf %>%
select(STATENS, AFFGEOID, GEOID, STUSPS, NAME, LSAD, ALAND, AWATER, geometry) %>%
rename(State='STUSPS')The geographical data can then be joined to the drug overdose data.
# join by state in order to make map plots
USdeaths <- inner_join(US.sf, drugODbyState, by = "State")More data manipulation will be employed in the Results section as part of several subsequent analyses.
To extract some interesting information, visualizing the overdose death rate across the entire United States can be done by implementing the package tmap. Using a for loop to iterate over the 3 years of interest, a static map for each year can be generated and saved.
# setting up for plotting static maps
for (year in list("2015", "2016", "2017")){
US_cont <- USdeaths %>%
filter(Year == year) %>%
subset(!GEOID %in% c("02", "15", "72")) %>%
simplify_shape(0.2)
US_AK <- USdeaths %>%
filter(Year == year) %>%
subset(GEOID == "02") %>%
simplify_shape(0.2)
US_HI <- USdeaths %>%
filter(Year == year) %>%
subset(GEOID == "15") %>%
simplify_shape(0.2)
US_states <- US_cont %>%
dplyr::select(geometry) %>%
aggregate(by = list(US_cont$State), FUN = mean)
# Contiguous US map
m_cont <- tm_shape(US_cont, projection = 2163) +
tm_polygons("ODrate", title = "Number of Deaths\n(per 100,000)", showNA = FALSE,
border.col = "white", border.alpha = .5,
style = "fixed",
breaks = c(0, 50, 100, 200, 300, 400, 500, 600, 700),
palette = "YlOrRd") +
tm_shape(US_states) +
tm_borders(lwd=1, col = "black", alpha = .5) +
tm_layout(title = paste("Overdose Death Rates in the US (per 100,000),", year),
title.size = 1,
title.position = c("center", "top"),
legend.position = c("right", "bottom"),
legend.width = 0.5,
legend.title.size = 0.8,
legend.text.size = 0.5,
legend.outside = FALSE,
frame = FALSE,
inner.margins = c(0.1, 0.1, 0.1, 0.1))
# Alaska map
m_AK <- tm_shape(US_AK, projection = 3338) +
tm_polygons("ODrate",
border.col = "white",
border.alpha = .5,
style = "fixed",
breaks = c(0, 50, 100, 200, 300, 400, 500, 600, 700),
palette = "YlOrRd") +
tm_layout("Alaska",
legend.show = FALSE,
bg.color = NA,
title.size = 0.8,
frame = FALSE)
# Hawaii map
m_HI <- tm_shape(US_HI, projection = 3759) +
tm_polygons("ODrate",
border.col = "white",
border.alpha = .5,
style = "fixed",
breaks = c(0, 50, 100, 200, 300, 400, 500, 600, 700),
palette = "YlOrRd") +
tm_layout("Hawaii",
legend.show = FALSE,
bg.color = NA,
title.position = c("LEFT", "BOTTOM"),
title.size = 0.8,
frame = FALSE)
# Use grid package to set where AK and HI map should be plotted on the contiguous map
vp_AK <- viewport(x = 0.15, y = 0.15, width = 0.3, height = 0.3)
vp_HI <- viewport(x = 0.4, y = 0.1, width = 0.2, height = 0.1)
# plot map to save individual figures
tmap_save(m_cont,
insets_tm = list(m_AK, m_HI),
insets_vp = list(vp_AK, vp_HI),
filename=paste0("ODinUS", year, ".png"))
graphics.off() # clears after each plot
}The exported .png maps can be re-imported and printed for viewing:
Comparison of these three maps show that in some states (particularly in the Northeast and Midwest region) have increased rates of drug overdoses from the years 2015 to 2017. For these states, it may be important for to allocate more resources to address this issue. Also, virtually no states show a decrease in overdose death rate over time.
Another way to look at the data is by creating a more interactive map. The percentage of deaths attributed to drug overdoses is calculated (dividing the total number of overdose deaths by the total number of deaths) for each state for the years 2015, 2016 and 2017. Then, percent change from the year 2015 to 2017 is calculate. The percent change in overdose death rate can be plotted as a interactive choropleth map using the leaflet package.
## Leaflet Maps
# OD percentage join with shapefile
percentODinUS <- inner_join(US.sf, percentOD, by = "State")
percentChange2017 <- percentODinUS %>%
group_by(State, Year) %>%
filter(Year == "2017")
percentChange2015 <- percentODinUS %>%
group_by(State, Year) %>%
filter(Year == "2015")
percentChange2017$diff <- percentChange2017$percent - percentChange2015$percent
# favorite color palette
pal_fun <- colorNumeric("PuRd", NULL)
# popup message
pu_message <- paste0(percentChange2017$NAME,
"<br>Drug Overdose Rate Change: ",
round(percentChange2017$diff, 2), "%")
# leaflet map OD percent change
leaflet(percentChange2017) %>%
addPolygons(stroke = TRUE, weight=1, color="white",
fillColor = ~pal_fun(diff),
fillOpacity = 1,
popup = pu_message) %>%
setView(lat = 39.8283, lng = -98.5795, zoom = 3) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLegend("bottomright",
pal=pal_fun,
values=~diff,
title = "Drug Overdose<br>Rate Change,<br>2015 to 2017 (%)",
opacity = 1) %>%
addScaleBar()From this leaflet interactive map, like the static maps above, for the majority of states, a small increase (less than 1%) has occured from 2015 to 2017 in their drug overdose rates. However, some states exhibit a larger increase in the drug overdose rates, in particular, the District of Columbia has seen a 3.69% increase between those two years.
The data can be visualized to show trends of the number of drug overdose deaths for each state by creating a faceted graph of each region of the US for each year. The four regions are as determined from the U.S. Census, the Northeast, Midwest, South, and West. For this, raw overdose death counts will be used. Additionally, using the geom_smooth() function, with method set to auto, a LOESS curve can be plotted through the points. This method is intended to use local regression to fit the curve through points in the scatterplot. It is useful for revealing trends in data.
Using a for loop, similarly as above, we can output the facet grid plots for four regions.
d = data %>%
group_by(State, Month) %>%
filter(Indicator == "Number of Drug Overdose Deaths") %>%
select(State, Year, Month, Data.Value)
NE = c('CT', 'ME', 'MA', 'NH', 'RI', 'VT', 'NJ', 'NY', 'PA')
MW = c('IL', 'IN', 'MI', 'OH', 'WI', 'IA', 'KS', 'MN', 'MO', 'NE', 'ND', 'SD')
S = c('DE', 'DC', 'FL', 'GA', 'MD', 'NC', 'SC', 'VA', 'WV', 'AL', 'KY', 'MI',
'TN', 'AR', 'LA', 'OK', 'TX')
W = c('AZ', 'CO', 'ID', 'MT', 'NV', 'NM', 'UT', 'WY', 'AK', 'CA', 'HI', 'OR', 'WA')
regions = list("Northeast", "Midwest", "South", "West")
i=1
for(states in list(NE, MW, S, W)){
dd = d %>%
filter(State %in% states) %>%
filter(Year != 2018)
dd$Year = factor(dd$Year, levels=c("2015", "2016", "2017"))
dd$Month = factor(dd$Month, levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
ggplot(dd, aes(y=Data.Value, x = Month, colour=State, group=State)) +
geom_line(aes(x=Month, y = Data.Value)) +
geom_point() +
geom_smooth(method = "auto", level=0.99) +
facet_grid( ~ Year ) +
geom_rangeframe() +
theme_minimal() +
theme(axis.text.x = element_text(angle=90, vjust=0.8)) +
labs(title = paste0("Drug Overdose Deaths for the ", regions[i]), x = "Month", y = "Number of Drug Overdose Deaths") +
ylim(low=0, high=6000)
ggsave(paste0(regions[i], ".png"))
i=i+1
}The .png images can be reimported:
By breaking it down by region, we can easily see that this is not a localized phenomenon. And in all US regions, there is at least one state that stands out with a much higher number of overdose deaths than its neighbors. The state with the most deaths for the Northeast is Pennsylvania, for the Midwest is Ohio, for the South is Florida, and for the West is California. The other information these plots provide is the overall trends for each state. Most states hold relatively consistent drug overdose death counts over the entire year, and for the three years in question. However, some states, specificially Pennsylvania, Ohio and Florida have seen large increases in drug overdose death counts.
Can I extract values from the loess regression?
#smooth_vals = predict(loess(is.factor(Month)~Data.Value, dd), dd$Data.Value)ADD DESCRIPTION OF RESULTS HERE
Finally, the drug overdose data can be used to perform some simple forecasting for the future. This is of great interest, because it is necessary to predict how the opioid epidemic will trend spatially with time, for purposes of allocation of public health resources as well as policy changes. To perform forecasting, we will employ the package forecast. Since this data is a time series, to predict future points the autoregressive integrated moving average (ARIMA) model can be used. First, the top 10 states with the highest number of drug overdose deaths are determined. Then, for the top 5, ARIMA and forecasting will be employed to create our predictive model.
# determine top 10 states with most amount of drug overdoses
top_N = 10
topStates = d %>%
group_by(State) %>%
filter(State != "US",
State != "YC") %>%
summarise(Total = sum(Data.Value)) %>%
arrange(desc(Total)) %>%
top_n(top_N) %>%
pull(State)
print(topStates)## [1] CA FL PA OH TX MI IL NY MA NJ
## Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX US UT VA VT WA WI WV WY YC
top_nd = subset(d, State %in% topStates)
top_nd$Date = sprintf("1%s%d", top_nd$Month, top_nd$Year)
top_nd$Date = as.Date(top_nd$Date, "%d%B%Y")Finally, the top 5 state forecasts can be plotted to show future trends. The 80% and 95% confidence intervals are also displayed on the plots.
CA = top_nd %>% filter(State == 'CA') %>% select(Data.Value)
FL = top_nd %>% filter(State == 'FL') %>% select(Data.Value)
PA = top_nd %>% filter(State == 'PA') %>% select(Data.Value)
OH = top_nd %>% filter(State == 'OH') %>% select(Data.Value)
TX = top_nd %>% filter(State == 'TX') %>% select(Data.Value)
ts(CA$Data.Value, start=c(2015, 1), end=c(2018, 0), frequency = 12) %>%
auto.arima() %>%
forecast(h=20) %>%
autoplot() + ggtitle("California Forecast") + ylab("Number of Drug Overdose Deaths")ts(FL$Data.Value, start=c(2015, 1), end=c(2018, 1), frequency = 12) %>%
auto.arima() %>%
forecast(h=20) %>%
autoplot() + ggtitle("Florida Forecast") + ylab("Number of Drug Overdose Deaths")ts(PA$Data.Value, start=c(2015, 1), end=c(2018, 1), frequency = 12) %>%
auto.arima() %>%
forecast(h=20) %>%
autoplot() + ggtitle("Pennsylvania Forecast") + ylab("Number of Drug Overdose Deaths")ts(OH$Data.Value, start=c(2015, 1), end=c(2018, 1), frequency = 12) %>%
auto.arima() %>%
forecast(h=20) %>%
autoplot() + ggtitle("Ohio Forecast") + ylab("Number of Drug Overdose Deaths")ts(TX$Data.Value, start=c(2015, 1), end=c(2018, 1), frequency = 12) %>%
auto.arima() %>%
forecast(h=20) %>%
autoplot() + ggtitle("Texas Forecast") + ylab("Number of Drug Overdose Deaths")All of these plots of a simple forecasting model with the data we have available indicates that over the next few years, the number of drug overdose deaths will continue to increase. This model doesn’t take into account any other factors, like how individual states are currently responding to the crisis or how doctors are decreasing the rate at which they prescribe opioids. It would be interesting to compare this model to newer data as it is released from the VSRR.
CONCLUSION